25 research outputs found
On the Evaluation of Tweet Timeline Generation Task
The Tweet Timeline Generation (TTG) task aims to generate a timeline of relevant but novel tweets that summarizes the development of a given topic. A typical TTG system first retrieves tweets and then detects novel tweets among them to form a timeline. In this paper, we examine the dependency of TTG on retrieval quality and how that dependency can bias evaluation. Our study showed a considerable dependency; however, the ranking of systems is not highly affected if a common retrieval run is used. Springer International Publishing Switzerland 2016.
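The retrieve-then-detect-novelty pipeline described in this abstract can be illustrated with a minimal sketch. The paper does not specify a novelty detector; the Jaccard token overlap, the `threshold` value, and the function names below are assumptions chosen only for illustration.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_timeline(retrieved: List[str], threshold: float = 0.6) -> List[str]:
    """Keep a retrieved tweet only if it is not too similar to any tweet
    already on the timeline. `retrieved` is the output of some retrieval
    run; its quality bounds what the timeline can contain."""
    timeline: List[str] = []
    seen: List[Set[str]] = []
    for tweet in retrieved:
        tok = tokens(tweet)
        if all(jaccard(tok, prev) < threshold for prev in seen):
            timeline.append(tweet)
            seen.append(tok)
    return timeline


retrieved_run = [
    "flood hits city center",
    "flood hits the city center",
    "rescue teams arrive",
]
print(build_timeline(retrieved_run))
```

Because the novelty step only sees what retrieval returns, feeding every system the same retrieval run isolates the novelty component when systems are compared.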
BroDyn’18: Workshop on analysis of broad dynamic topics over social media
This book constitutes the refereed proceedings of the 40th European Conference on IR Research, ECIR 2018, held in Grenoble, France, in March 2018.
The 39 full papers and 39 short papers, presented together with 6 demos, 5 workshops and 3 tutorials, were carefully reviewed and selected from 303 submissions. Accepted papers cover the state of the art in information retrieval, including topics such as topic modeling, deep learning, evaluation, user behavior, document representation, recommendation systems, retrieval methods, learning and classification, and micro-blogs.
CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media
We describe the third edition of the CheckThat! Lab, which is part of the
2020 Cross-Language Evaluation Forum (CLEF). CheckThat! proposes four
complementary tasks and a related task from previous lab editions, offered in
English, Arabic, and Spanish. Task 1 asks to predict which tweets in a Twitter
stream are worth fact-checking. Task 2 asks to determine whether a claim posted
in a tweet can be verified using a set of previously fact-checked claims. Task
3 asks to retrieve text snippets from a given set of Web pages that would be
useful for verifying a target tweet's claim. Task 4 asks to predict the
veracity of a target tweet's claim using a set of Web pages and potentially
useful snippets in them. Finally, the lab offers a fifth task that asks to
predict the check-worthiness of the claims made in English political debates
and speeches. CheckThat! features a full evaluation framework. The evaluation
is carried out using mean average precision or precision at rank k for ranking
tasks, and F1 for classification tasks.
Comment: Computational journalism, Check-worthiness, Fact-checking, Veracity,
CLEF-2020 CheckThat! Lab
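The evaluation measures named in this abstract (mean average precision, precision at rank k, and F1) can be sketched as follows. This is a generic textbook formulation, not the lab's official scorer; the function names and the toy tweet identifiers are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank holding a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


ranking = ["t1", "t2", "t3", "t4"]   # hypothetical tweet ids, best first
check_worthy = {"t1", "t3"}          # hypothetical gold labels
print(precision_at_k(ranking, check_worthy, 2))
print(average_precision(ranking, check_worthy))
```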
Automated Fact-Checking for Assisting Human Fact-Checkers
The reporting and analysis of current events around the globe has expanded
from professional, editor-led journalism all the way to citizen journalism.
Politicians and other key players enjoy direct access to their audiences
through social media, bypassing the filters of official cables or traditional
media. However, the multiple advantages of free speech and direct communication
are dimmed by the misuse of the media to spread inaccurate or misleading
claims. These phenomena have led to the modern incarnation of the fact-checker
-- a professional whose main aim is to examine claims using available evidence
to assess their veracity. As in other text forensics tasks, the amount of
information available makes the work of the fact-checker more difficult. With
this in mind, starting from the perspective of the professional fact-checker,
we survey the available intelligent technologies that can support the human
expert in the different steps of her fact-checking endeavor. These include
identifying claims worth fact-checking; detecting relevant previously
fact-checked claims; retrieving relevant evidence to fact-check a claim; and
actually verifying a claim. In each case, we pay attention to the challenges in
future work and the potential impact on real-world fact-checking.
Comment: fact-checking, fact-checkers, check-worthiness, detecting previously
fact-checked claims, evidence retrieval
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets.
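The abstract reports inter-annotator agreement as a Kappa value without naming the variant. As a minimal sketch, assuming Cohen's kappa between two annotators (an assumption, not something the abstract states), agreement on relevance labels could be computed like this; the labels shown are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = [1, 1, 0, 0]  # hypothetical relevance judgments
annotator_2 = [1, 1, 0, 1]
print(cohens_kappa(annotator_1, annotator_2))
```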
Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media
We present an overview of the third edition of the CheckThat! Lab at CLEF
2020. The lab featured five tasks in two different languages: English and
Arabic. The first four tasks compose the full pipeline of claim verification in
social media: Task 1 on check-worthiness estimation, Task 2 on retrieving
previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on
claim verification. The lab is completed with Task 5 on check-worthiness
estimation in political debates and speeches. A total of 67 teams registered to
participate in the lab (up from 47 at CLEF 2019), and 23 of them actually
submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural
networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over
the baselines on all tasks. Here we describe the task setup, the evaluation
results, and a summary of the approaches used by the participants, and we
discuss some lessons learned. Last but not least, we release to the research
community all datasets from the lab as well as the evaluation scripts, which
should enable further research in the important tasks of check-worthiness
estimation and automatic claim verification.
Comment: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based
Verification, Detecting Previously Fact-Checked Claims, Social Media
Verification, Computational Journalism, COVID-19
Benchmarking Arabic AI with Large Language Models
With large Foundation Models (FMs), language technologies (AI in general) are
entering a new paradigm: eliminating the need for developing large-scale
task-specific datasets and supporting a variety of tasks through set-ups
ranging from zero-shot to few-shot learning. However, understanding FMs'
capabilities requires a systematic benchmarking effort that compares FMs'
performance with the state-of-the-art (SOTA) task-specific models. With that
goal, past work focused on the English language and included a few efforts with
multiple languages. Our study contributes to ongoing research by evaluating FMs'
performance for standard Arabic NLP and Speech processing, including a range of
tasks from sequence tagging to content classification across diverse domains.
We start with zero-shot learning using GPT-3.5-turbo, Whisper, and USM,
addressing 33 unique tasks using 59 publicly available datasets resulting in 96
test setups. For a few tasks, FMs perform on par with or exceed the performance of
the SOTA models, but for the majority they underperform. Given the importance of
the prompt for FM performance, we discuss our prompt strategies in detail and
elaborate on our findings. Our future work on Arabic AI will explore few-shot
prompting, expand the range of tasks, and investigate additional open-source
models.
Comment: Foundation Models, Large Language Models, Arabic NLP, Arabic Speech,
Arabic AI, ChatGPT Evaluation, USM Evaluation, Whisper Evaluation
Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis
BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings.
METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681.
FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I²=98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I²=92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I²=94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I²=98%) than in other migrant groups (6·6%, 1·8-11·3; I²=92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I²=96%) than in migrants in hospitals (24·3%, 16·1-32·6; I²=98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations.
INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration.
FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London.
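The abstract states that random-effects models were used to pool prevalence but does not name the estimator. The sketch below assumes the DerSimonian-Laird method on untransformed proportions with a normal-approximation 95% interval; both choices are assumptions, and the study counts are invented for illustration only.

```python
import math
from typing import Dict, List, Tuple


def pooled_prevalence(studies: List[Tuple[int, int]]) -> Dict[str, float]:
    """DerSimonian-Laird random-effects pooling of (cases, sample_size) pairs."""
    props = [cases / n for cases, n in studies]
    variances = [p * (1 - p) / n for p, (_, n) in zip(props, studies)]
    if any(v == 0 for v in variances):
        raise ValueError("a study with 0% or 100% prevalence needs a correction")
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0       # heterogeneity share
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "tau2": tau2,
        "i2": i2,
    }


# Invented counts, not data from the review.
print(pooled_prevalence([(10, 100), (50, 100), (30, 120)]))
```

A high I² such as those reported in the abstract means most of the observed variation reflects differences between studies rather than sampling error, which is why a random-effects model widens the interval relative to a fixed-effect one.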
Surgical site infection after gastrointestinal surgery in high-income, middle-income, and low-income countries: a prospective, international, multicentre cohort study
Background: Surgical site infection (SSI) is one of the most common infections associated with health care, but its importance as a global health priority is not fully understood. We quantified the burden of SSI after gastrointestinal surgery in countries in all parts of the world.
Methods: This international, prospective, multicentre cohort study included consecutive patients undergoing elective or emergency gastrointestinal resection within 2-week time periods at any health-care facility in any country. Countries with participating centres were stratified into high-income, middle-income, and low-income groups according to the UN's Human Development Index (HDI). Data variables from the GlobalSurg 1 study and other studies that have been found to affect the likelihood of SSI were entered into risk adjustment models. The primary outcome measure was the 30-day SSI incidence (defined by US Centers for Disease Control and Prevention criteria for superficial and deep incisional SSI). Relationships with explanatory variables were examined using Bayesian multilevel logistic regression models. This trial is registered with ClinicalTrials.gov, number NCT02662231.
Findings: Between Jan 4, 2016, and July 31, 2016, 13 265 records were submitted for analysis. 12 539 patients from 343 hospitals in 66 countries were included. 7339 (58·5%) patients were from high-HDI countries (193 hospitals in 30 countries), 3918 (31·2%) patients were from middle-HDI countries (82 hospitals in 18 countries), and 1282 (10·2%) patients were from low-HDI countries (68 hospitals in 18 countries). In total, 1538 (12·3%) patients had SSI within 30 days of surgery. The incidence of SSI varied between countries with high (691 [9·4%] of 7339 patients), middle (549 [14·0%] of 3918 patients), and low (298 [23·2%] of 1282) HDI (p < 0·001). The highest SSI incidence in each HDI group was after dirty surgery (102 [17·8%] of 574 patients in high-HDI countries; 74 [31·4%] of 236 patients in middle-HDI countries; 72 [39·8%] of 181 patients in low-HDI countries). Following risk factor adjustment, patients in low-HDI countries were at greatest risk of SSI (adjusted odds ratio 1·60, 95% credible interval 1·05–2·37; p=0·030). 132 (21·6%) of 610 patients with an SSI and a microbiology culture result had an infection that was resistant to the prophylactic antibiotic used. Resistant infections were detected in 49 (16·6%) of 295 patients in high-HDI countries, in 37 (19·8%) of 187 patients in middle-HDI countries, and in 46 (35·9%) of 128 patients in low-HDI countries (p < 0·001).
Interpretation: Countries with a low HDI carry a disproportionately greater burden of SSI than countries with a middle or high HDI and might have higher rates of antibiotic resistance. In view of WHO recommendations on SSI prevention that highlight the absence of high-quality interventional research, urgent, pragmatic, randomised trials based in LMICs are needed to assess measures aiming to reduce this preventable complication.
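The crude 30-day SSI incidences quoted in the Findings can be recomputed from the reported counts. The short check below uses only numbers stated in the abstract; the dictionary layout and function name are illustrative. It covers unadjusted incidence only and does not reproduce the Bayesian multilevel adjustment.

```python
# (SSI cases, patients) per HDI group, as reported in the Findings.
reported = {
    "high": (691, 7339),
    "middle": (549, 3918),
    "low": (298, 1282),
}


def incidence_percent(cases: int, patients: int) -> float:
    """Crude incidence as a percentage of patients."""
    return 100.0 * cases / patients


for group, (cases, patients) in reported.items():
    print(f"{group}-HDI: {incidence_percent(cases, patients):.1f}%")

total_cases = sum(cases for cases, _ in reported.values())
total_patients = sum(patients for _, patients in reported.values())
print(f"overall: {incidence_percent(total_cases, total_patients):.1f}%")
```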